PCA vignette Principal components analysis with snpStats

Author

  • David Clayton
Abstract

Usually, principal components analysis is carried out by calculating the eigenvalues and eigenvectors of the correlation matrix. With N cases and P variables, if we write X for the N × P matrix which has been standardised so that columns have zero mean and unit standard deviation, we find the eigenvalues and eigenvectors of the P × P matrix X^T.X (which is N or (N − 1) times the correlation matrix, depending on which denominator was used when calculating standard deviations). The first eigenvector gives the loadings of each variable in the first principal component, the second eigenvector gives the loadings in the second component, and so on. Writing the first C component loadings as columns of the P × C matrix B, the N × C matrix of subjects' principal component scores, S, is obtained by applying the factor loadings to the original data matrix, i.e. S = X.B. The sum of squares and products matrix, S^T.S = D, is diagonal with elements equal to the first C eigenvalues of the X^T.X matrix, so that the variances of the principal components can be obtained by dividing the eigenvalues by N or (N − 1).

This standard method is rarely feasible for genome-wide data since P is very large indeed and calculating the eigenvectors of X^T.X becomes impossibly onerous. However, the calculations can also be carried out by calculating the eigenvalues and eigenvectors of the N × N matrix X.X^T. The (non-zero) eigenvalues of this matrix are the same as those of X^T.X, and its eigenvectors are proportional to the principal component scores defined above; writing the first C eigenvectors of X.X^T as the columns of the N × C matrix, U, then U = S.D^(−1/2). Since for many purposes we are not too concerned about the scaling of the principal components, it will often be acceptable to use the eigenvectors, U, in place of the more conventionally scaled principal components. However, some attention should be paid to the corresponding eigenvalues since, as noted above, these are proportional to the
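The equivalence described in the abstract can be checked numerically. The following is a minimal sketch in Python with NumPy, not the snpStats implementation: the data are random numbers invented for illustration, and the variable names simply mirror the abstract's symbols (X, B, S, D, U). It compares the eigen-decomposition of the P × P cross-product matrix (X transposed times X) with that of the N × N matrix (X times X transposed).

```python
import numpy as np

# Illustrative data only: few subjects (N), many variables (P).
rng = np.random.default_rng(0)
N, P, C = 6, 40, 2
raw = rng.normal(size=(N, P))

# Standardise columns to zero mean and unit SD (denominator N - 1).
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# Standard route: eigen-decomposition of the P x P matrix X^T.X.
evals_p, evecs_p = np.linalg.eigh(X.T @ X)
top_p = np.argsort(evals_p)[::-1][:C]      # indices of the C largest eigenvalues
B = evecs_p[:, top_p]                      # P x C loadings
S = X @ B                                  # N x C principal component scores
D = S.T @ S                                # diagonal, holding the top C eigenvalues

# Dual route: eigen-decomposition of the N x N matrix X.X^T.
evals_n, evecs_n = np.linalg.eigh(X @ X.T)
top_n = np.argsort(evals_n)[::-1][:C]
U = evecs_n[:, top_n]                      # N x C eigenvectors

# The leading eigenvalues agree between the two routes.
eigs_match = np.allclose(evals_p[top_p], evals_n[top_n])

# U = S.D^(-1/2), up to the arbitrary sign of each eigenvector.
U_from_S = S / np.sqrt(np.diag(D))
scores_match = np.allclose(np.abs(U_from_S), np.abs(U))

# Component variances are the eigenvalues divided by (N - 1).
pc_variances = np.diag(D) / (N - 1)

print(eigs_match, scores_match)
```

The comparison uses absolute values because an eigenvector is only defined up to sign, so the two routes may return columns that differ by a factor of −1.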

Similar resources

Perception of drug addiction among Turkish university students: causes, cures, and attitudes.

In this paper, university students' beliefs about different causes of drug addiction and cures for it were investigated. Principal component analysis (PCA) with Causes of Drug Abuse Scale (CADAS) revealed four components: problems and coping, sensation seeking, social environment, and disposition. PCA with Cures for Drug Abuse Scale (CUDAS) produced four components: help seeking and avoidance, ...

Analysis of physiochemical and microbial quality of waters of the Karkheh River in southwestern Iran using multivariate statistical methods

Rapid population growth as well as agricultural and industrial development have increased the contamination of Iranian rivers. This study utilized principal components analysis (PCA) to determine the degree of significance of qualitative parameters of water resources in the Karkheh River in southwestern Iran. Cluster analysis (CA) grouped the monitoring stations based on the water quality data ...

A new weighting approach to non-parametric composite indices compared with principal components analysis

The introduction of the Human Development Index (HDI) by UNDP in early 1990 was followed by a surge in the use of non-parametric and parametric indices for measuring and comparing countries' performance in development, globalization, competition, well-being, etc. The HDI is a composite index of three indicators. Its components reflect three major dimensions of human development: longevity, knowledg...

Patterns Prediction of Chemotherapy Sensitivity in Cancer Cell lines Using FTIR Spectrum, Neural Network and Principal Components Analysis

Drug resistance enables cancer cells to escape the cytotoxic effect of anticancer drugs. Identification of the resistant phenotype is very important because it can lead to an effective treatment plan. There is interest in developing models that classify the resistance phenotype based on multivariate data. We have investigated a vibrational spectroscopic approach in order to characterize a...

Publication date: 2016